ShRkC: Shard Rank Cutoff Prediction for Selective Search

نویسنده

  • Anagha Kulkarni
چکیده

In search environments where large document collections are partitioned into smaller subsets (shards), processing the query against only the relevant shards improves search efficiency. The problem of ranking the shards based on their estimated relevance to the query has been studied extensively. However, a related important task of identifying how many of the top ranked relevant shards should be searched for the query, so as to balance the competing objectives of effectiveness and efficiency, has not received much attention. This task of shard rank cutoff estimation is the focus of the presented work. The central premise for the proposed solution is that the number of top shards searched should be dependent on – 1. the query, 2. the given ranking of shards, and 3. on the type of search need being served (precision-oriented versus recall-oriented task). An array of features that capture these three factors are defined, and a regression model is induced based on these features to learn a queryspecific shard rank cutoff estimator. An empirical evaluation using two large datasets demonstrates that the learned shard rank cutoff estimator provides substantial improvement in search efficiency as compared to strong baselines without degrading search effectiveness.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Shard Selection for Selective Search

The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards), and searching only a few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for the query, directly impacts Selective Search performance. We thus investigate three new approaches ...

متن کامل

Efficient and Effective Large-scale Search

Search engine indexes for large document collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computing resources the high query processing cost of this exhaustive search setup can be a deterrent to working with ...

متن کامل

Balancing Precision and Recall with Selective Search

This work revisits the age-old problem of balancing search precision and recall using the promising new approach of Selective Search which partitions the document collection into topic-based shards and searches a select few shards for any query. In prior work Selective Search has demonstrated strong search precision, however, this improvement has come at the cost of search recall. In this work,...

متن کامل

Does Selective Search Benefit from WAND Optimization?

Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often resulting in 90+% reductions i...

متن کامل

Comparison of Map@500 Scores (cw09-a) for Rank-s and Taily Instances. a C R O N Y M S Csi Centralized Sample Index Wand Weighted and Xii

Selective search is a modern distributed search architecture designed to reduce the computational cost of large-scale search. Selective search creates topical shards that are deliberately contentskewed, placing highly similar documents together in the same shard. During query time, rather than searching the entire corpus, a resource selection algorithm selects a subset of the topic shards likel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015